What Kind of Information is Necessary for NLP and MT?
Abstract
In the past, researchers in natural language processing (NLP) and machine translation (MT) were mainly interested in linguistic theories, parsing, and generation. They discussed specific types of language expressions such as garden path sentences, and did not pay much attention to the problems which may arise when a huge volume of real, existing sentences is handled, such as newspaper articles, patent documents, and so on. Once we get into this area of processing large text corpora, we are confronted with several problems: we have to write a complete set of grammatical rules, we have to prepare a comprehensive dictionary, and so on. A major problem here lies not in linguistically interesting but seldom-appearing phenomena, but in the average success rate of parsing, generation, etc. for a large text corpus, which seldom includes such sophisticated sentential structures as garden path sentences. Such a corpus poses different types of difficult problems, for example, parsing long sentences composed of more than thirty words, and building a good lexicon. In university research, where new ideas are most important, building a good, complete lexicon has not been a central issue. At companies, on the contrary, where machine translation systems are commercialized, people are forced to construct a complete dictionary which includes all the common words and a sufficient number of technical terms in several specific fields. Dictionary construction takes a long time and a great deal of money. The dictionary has to be consistent as a whole, and the quality of its contents must be uniform across all the words. Japanese companies spent a lot of money on dictionaries for machine translation. From this bitter experience, they got together to construct a basic electronic dictionary which they can share among themselves, with the help of the Japanese Government.
Related resources
What should we do next for MT system development?
Machine translation (MT) research and development began at the end of the 1950s, when not only natural language processing (NLP) technology but also linguistic theory was at a primitive level. Given the restricted memory sizes and computing power at that time, MT presented one of the most difficult and challenging research themes of the day. Thus, MT researchers and developers were forgiven when th...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increasing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
The Lexicon and MT: a position paper
The recent trend towards developing the lexical component of NLP systems has focussed attention on two potentially valuable sources of lexical data: printed dictionaries for humans and large text corpora. This presentation considers the types of information that might be required by MT researchers and the extent to which this information can be derived from these two sources. This raises a numb...
A Study of Liability Arising from Concealing the Harms of Drugs and the Role of the Warning Rule (Tahzir)
Background and aim: Medicines are among the most vital consumer goods required by human society, and they may carry complications and risks for consumers. Medicines and medical equipment are also of particular complexity today. Taking full advantage of them therefore requires consumer training, so that the consumer does not suffer harm and benefits appropriately from the drug. Therefo...
Thinking as Evidence for the Probability of the Existence of a God: An Argument from Unnaturalness for Necessity
The objective of this article is to show that it is justified to assert that the existence of God is plausible, considering the fact that thinking itself is an immediate outcome (effect) of a thinker (cause). This idea may seem evident, but it is in fact challenged by certain claims of cognitive philosophers who aver that our knowledge of necessity and causation is, i...
Discourse and Document-level Information for Evaluating Language Output Tasks
Evaluating the quality of language output tasks such as Machine Translation (MT) and Automatic Summarisation (AS) is a challenging topic in Natural Language Processing (NLP). Recently, techniques focusing only on the use of outputs of the systems and source information have been investigated. In MT, this is referred to as Quality Estimation (QE), an approach that uses machine learning technique...